Use the doc-values skip index to skip per-doc value lookups in LongRangeFacetCutter by slow-J · Pull Request #16268 · apache/lucene

slow-J · 2026-06-17T16:03:12Z

Resolves #16249

Implementation heavily inspired by HistogramCollector.java.

Range faceting in the sandbox module (LongRangeFacetCutter) currently reads the doc-values value for every matching document and binary-searches it into an elementary interval. When the faceted field is single-valued and has a doc-values skip index, we can use that index instead. For a dense skip block whose min and max values fall into the same elementary interval, every document in that block maps to that interval, so we can skip the per-doc value lookup and binary search.

Limitation - applies to single-valued, long fields only.
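
The block-level shortcut described above can be illustrated with a small, Lucene-free model. This is only a sketch under stated assumptions, not the code in this PR: `Block`, `intervalOf`, and `count` are hypothetical names, elementary intervals are represented by their inclusive upper bounds (the last one being `Long.MAX_VALUE` so every value lands somewhere), and every document in a block is assumed to match the query and to have a value (a dense block).

```java
import java.util.Arrays;

final class SkipBlockSketch {
  /** Hypothetical stand-in for one skip-index block of a single-valued field. */
  static final class Block {
    final long min;
    final long max;
    final long[] values; // one value per document in the block (dense)

    Block(long... values) {
      this.values = values;
      // A real skip index stores min/max at index time; here they are derived for the demo.
      this.min = Arrays.stream(values).min().getAsLong();
      this.max = Arrays.stream(values).max().getAsLong();
    }
  }

  /** Binary-searches v into the elementary interval whose inclusive upper bound covers it. */
  static int intervalOf(long[] upperBounds, long v) {
    int pos = Arrays.binarySearch(upperBounds, v);
    return pos >= 0 ? pos : -pos - 1;
  }

  /** Counts documents per interval; lookups[0] records how many per-doc lookups were needed. */
  static long[] count(long[] upperBounds, Block[] blocks, long[] lookups) {
    long[] counts = new long[upperBounds.length];
    for (Block b : blocks) {
      int lo = intervalOf(upperBounds, b.min);
      int hi = intervalOf(upperBounds, b.max);
      if (lo == hi) {
        // Min and max share an interval, so every value in the block does too.
        counts[lo] += b.values.length;
      } else {
        // Block straddles a boundary: fall back to one lookup and search per document.
        for (long v : b.values) {
          counts[intervalOf(upperBounds, v)]++;
          lookups[0]++;
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    long[] upperBounds = {9L, 19L, Long.MAX_VALUE};
    Block[] blocks = {new Block(1, 5, 7), new Block(8, 12), new Block(25)};
    long[] lookups = new long[1];
    long[] counts = count(upperBounds, blocks, lookups);
    // prints "counts=[4, 1, 1] perDocLookups=2"
    System.out.println("counts=" + Arrays.toString(counts) + " perDocLookups=" + lookups[0]);
  }
}
```

In this model only the middle block, whose values 8 and 12 straddle the boundary at 9, pays for per-document lookups; the other two blocks are counted with two binary searches each, regardless of how many documents they hold.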

Benchmark (luceneutil)

I used my luceneutil branch, https://github.com/slow-J/luceneutil/tree/github-16249-range-facet-bench, which cherry-picks two of @epotyom's commits (mainly mikemccand/luceneutil#582, which adds range-facet support).

Setup:
localrun.py, wikimediumall (33.3M docs), index sorted by lastMod_skipper with addDVSkippers=true. Baseline = main, candidate = this change, both using DURING_COLLECTION, so the only difference is this optimization. 30 JVM iterations.

Command: python3 -u src/python/localrun.py -s rangeFacetsWikimediumAll -b lucene_baseline -c lucene_candidate -iterations 30 -warmups 20 2>&1 | tee "$BASE/run-timing7.txt"

Edit: new benchmark results after the changes for Egor's first two comments.
Edit 2: new benchmark results after the unwrapping was removed.

QPS

Task	QPS baseline	StdDev	QPS modified	StdDev	Pct diff	p-value
BrowseLastModOvlpRangeFacets	1.26	(7.7%)	2.72	(10.6%)	115.5% (90% - 145%)	0.000
BrowseLastModRangeFacets	2.21	(6.0%)	3.31	(8.8%)	50.0% (33% - 68%)	0.000
MedTermLastModOvlpRangeFacets	3.82	(13.5%)	5.48	(5.7%)	43.5% (21% - 72%)	0.000
MedTermLastModRangeFacets	4.15	(13.6%)	5.26	(7.9%)	26.5% (4% - 55%)	0.000
BrowseIDOvlpRangeFacets	1.21	(6.6%)	1.10	(6.7%)	-9.6% (-21% - 4%)	0.000
BrowseIDRangeFacets	2.33	(8.6%)	2.57	(5.1%)	10.1% (-3% - 26%)	0.000
MedTermIDOvlpRangeFacets	3.79	(13.5%)	4.61	(11.1%)	21.6% (-2% - 53%)	0.000
MedTermIDRangeFacets	5.98	(4.6%)	5.92	(2.7%)	-0.9% (-7% - 6%)	0.340

Latency (ms) — aggregated across all iterations

Task	P50 B	P50 C	Diff	P90 B	P90 C	Diff	P99 B	P99 C	Diff	P999 B	P999 C	Diff	P100 B	P100 C	Diff
BrowseLastModOvlpRangeFacets	844.184	386.006	-54.3%	1437.289	581.094	-59.6%	7523.983	828.460	-89.0%	9510.480	868.764	-90.9%	9555.393	888.500	-90.7%
BrowseLastModRangeFacets	474.762	319.836	-32.6%	854.574	546.789	-36.0%	4412.829	781.421	-82.3%	7775.105	862.760	-88.9%	7910.258	893.449	-88.7%
MedTermLastModOvlpRangeFacets	286.226	187.654	-34.4%	552.668	436.448	-21.0%	771.820	599.881	-22.3%	1327.279	705.213	-46.9%	1445.804	707.766	-51.0%
MedTermLastModRangeFacets	260.932	200.115	-23.3%	652.004	510.872	-21.6%	847.848	635.331	-25.1%	2966.134	743.950	-74.9%	3060.317	745.647	-75.6%
BrowseIDOvlpRangeFacets	860.895	976.209	+13.4%	1419.693	1279.444	-9.9%	8271.185	1476.704	-82.1%	9919.502	1531.237	-84.6%	9928.280	1536.195	-84.5%
BrowseIDRangeFacets	461.967	404.593	-12.4%	799.144	625.845	-21.7%	5972.427	860.420	-85.6%	8963.973	930.259	-89.6%	9483.903	942.619	-90.1%
MedTermIDOvlpRangeFacets	294.831	235.198	-20.2%	676.861	539.088	-20.4%	897.009	671.736	-25.1%	1835.175	742.857	-59.5%	2055.182	744.089	-63.8%
MedTermIDRangeFacets	169.089	170.565	+0.9%	495.786	401.676	-19.0%	697.206	591.299	-15.2%	1026.169	690.797	-32.7%	1647.263	695.272	-57.8%

slow-J · 2026-06-18T15:17:21Z

I reran the benchmarks, this time correctly using localrun, and updated the results in #16268 (comment).

epotyom

Nice change! One suggestion below

epotyom · 2026-06-30T06:10:23Z

+  }
+
+  /** Single-valued {@link LongValues} for {@link #skipField} in this segment. */
+  final LongValues skipFieldValues(LeafReaderContext context) throws IOException {


Is this method a part of the single value unwrapping logic that we want to remove for now?

Nope, this is part of the core skipper path.

Oh I see now. The initial logic was:

if single-valued AND has DocValuesSkipper: use the optimized version

Then in the second revision we tried:

if single-valued:
    if has skipper: use the skipper-optimized version
    else: use the single-valued optimized version

And now we are back to the first approach?
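
To make the difference between the two routing variants concrete, here is a minimal sketch. The names `Path`, `firstApproach`, and `secondRevision` are hypothetical and do not appear in the PR, and it assumes that anything not routed to an optimized path falls through to the existing logic.

```java
final class RoutingSketch {
  /** Hypothetical labels for the code paths discussed in this thread. */
  enum Path { SKIPPER, SINGLE_VALUED, EXISTING }

  /** First approach: optimize only when the segment is single-valued AND has a skipper. */
  static Path firstApproach(boolean singleValued, boolean hasSkipper) {
    return (singleValued && hasSkipper) ? Path.SKIPPER : Path.EXISTING;
  }

  /** Second revision: single-valued segments without a skipper get their own optimized path. */
  static Path secondRevision(boolean singleValued, boolean hasSkipper) {
    if (!singleValued) {
      return Path.EXISTING; // assumption: multi-valued segments keep the existing logic
    }
    return hasSkipper ? Path.SKIPPER : Path.SINGLE_VALUED;
  }
}
```

The two variants differ only for single-valued segments without a skipper: the first leaves them on the existing logic, while the second always sends them to the single-valued path.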

The reason I got confused is that I thought we also used the skipper optimization for multi-valued fields, but I see now that the description explicitly calls out single-valued fields only. I wonder why that is the case? I thought Lucene supported skippers for multi-valued fields as well.

Just to clarify why I'm asking about multi-valued fields: the current version still uses the unwrap-singleton optimization, but only together with the skipper. So there are basically two optimizations here, and I thought that was what you wanted to avoid?

If that’s the case, and if multi-valued fields support skippers, I’d suggest scoping this PR to the skipper optimization only, for both single- and multi-valued fields, and moving the single-valued unwrapping optimization to a follow-up PR. WDYT?

I think I'm getting confused by the naming here. I have removed the change that was not related to the skipper.

The unwrapSingleton call is used to detect whether the segment is single-valued and to get the single-valued NumericDocValues. The other change, now removed, caused single-valued segments with no skipper to always go to the single-valued cutter when it should depend on the existing logic.

Thanks for this and the other comments. I'll look into adding multi-valued skipper support when I get the time, and do it in this PR.

epotyom · 2026-07-02T10:26:42Z

  final LongValuesSource singleValues;
+
+  // Field name whose skip index is used on the single-valued path, or null when faceting a source.
+  final String skipField;


WDYT about renaming this to fieldName? skip is a bit confusing, since the field does not necessarily have a skipper even if this value is set.

epotyom · 2026-07-02T10:27:35Z

+  }
+
+  /** Single-valued {@link LongValues} for {@link #skipField} in this segment. */
+  final LongValues skipFieldValues(LeafReaderContext context) throws IOException {


If we rename skipField to fieldName, maybe we should rename this method as well, perhaps to singletonFieldValues?

epotyom · 2026-07-02T10:27:54Z

+  /** Single-valued {@link LongValues} for {@link #skipField} in this segment. */
+  final LongValues skipFieldValues(LeafReaderContext context) throws IOException {
+    NumericDocValues values =
+        DocValues.unwrapSingleton(DocValues.getSortedNumeric(context.reader(), skipField));


Can we deduplicate the calls to DocValues.getSortedNumeric and DocValues.unwrapSingleton? We call them both in maybeSkipper and here for the same field.

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 03d7d2a to 066c419 Compare June 17, 2026 16:03

github-actions Bot added the module:sandbox label Jun 17, 2026

github-actions Bot added this to the 10.5.0 milestone Jun 17, 2026

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 2e7144b to 0c72d5f Compare June 19, 2026 14:45

epotyom reviewed Jun 19, 2026

Comment thread ...ne/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/LongRangeFacetCutter.java Outdated

epotyom reviewed Jun 19, 2026

Comment thread ...ne/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/LongRangeFacetCutter.java Outdated

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 1065433 to 7db2833 Compare June 23, 2026 10:39

github-actions Bot modified the milestones: 10.5.0, 10.6.0 Jun 23, 2026

slow-J requested a review from epotyom June 23, 2026 11:29

slow-J added 4 commits June 29, 2026 11:01

Use the doc-values skip index to skip per-doc value lookups in LongRangeFacetCutter (1bf8688)

Remove redundant assertion (2a8d01f)

Extend the skip-index fast path to non-dense blocks and reuse the interval tracker (dc38fbc)

Remove single-valued unwrapping routing (88fe293)

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 7db2833 to 88fe293 Compare June 29, 2026 11:40

epotyom reviewed Jun 30, 2026

Merge remote-tracking branch 'origin/main' into lucene-16249-skipper-range-facets (725a74f); conflicts: lucene/CHANGES.txt

slow-J marked this pull request as ready for review June 30, 2026 09:59

epotyom reviewed Jul 2, 2026

slow-J marked this pull request as draft July 2, 2026 16:59

slow-J wants to merge 5 commits into apache:main from slow-J:lucene-16249-skipper-range-facets